Exploration on Approaches to Email (Text) Classification: CS 350 Project
Author
Abstract
Basic theory of text categorization and information retrieval is presented, and several important text classification algorithms are described in detail, including the Rocchio algorithm, TFIDF classifiers and the Naïve Bayes algorithm. An implementation based on the Rocchio algorithm is also discussed and evaluated. The evaluation shows that this method is reasonably efficient given fairly small training datasets. However, to improve the performance of text classification algorithms and construct better ones, extended feature selection, such as word sequences, should be taken into consideration.
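The abstract does not give the details of the Rocchio-based implementation, so the following is only a minimal sketch of the general idea, assuming a simplified centroid-only variant (no negative-example term), whitespace tokenization, and cosine similarity over TF-IDF vectors. All function names and the toy training data are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would do more.
    return text.lower().split()


def tfidf_vector(tokens, idf):
    # Term frequency times inverse document frequency, L2-normalized.
    # Terms never seen in training are dropped.
    tf = Counter(tokens)
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train_rocchio(labeled_docs):
    # labeled_docs: list of (label, text) pairs.
    tokenized = [(label, tokenize(text)) for label, text in labeled_docs]
    n_docs = len(tokenized)
    df = Counter()
    for _, tokens in tokenized:
        df.update(set(tokens))
    idf = {t: math.log(n_docs / c) + 1.0 for t, c in df.items()}

    # One prototype per class: the mean of that class's TF-IDF vectors.
    sums, counts = {}, Counter()
    for label, tokens in tokenized:
        total = sums.setdefault(label, Counter())
        for t, w in tfidf_vector(tokens, idf).items():
            total[t] += w
        counts[label] += 1
    centroids = {
        label: {t: w / counts[label] for t, w in total.items()}
        for label, total in sums.items()
    }
    return idf, centroids


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text, idf, centroids):
    # Assign the label whose prototype is most similar to the document.
    vec = tfidf_vector(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy data, for illustration only.
training = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize win money"),
    ("ham", "meeting agenda for monday"),
    ("ham", "project meeting notes attached"),
]
idf, centroids = train_rocchio(training)
print(classify("win a cash prize", idf, centroids))
print(classify("monday project meeting", idf, centroids))
```

The full Rocchio formulation also subtracts a weighted average of the vectors from other classes when building each prototype; that term is omitted here to keep the sketch short.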
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches to author profiling suffer from problems like high dimensionality of features and fail to capture th...
A High Capacity Email Steganography Scheme using Dictionary
The main objective of steganography is to conceal a secret message within a cover-media in such a way that only the original receiver can discern the presence of the hidden message. The cover-media can be a text, email, audio, image, and video, which can be transmitted through a public channel, such as the Internet. By extending the use of email among Internet users, the provision of email steg...
Analysis of an Image Spam in Email Based on Content Analysis
Researchers initially addressed the problem of spam detection as a text classification or categorization problem. However, as spammers continue to develop new techniques and the type of email content becomes more disparate, text-based anti-spam approaches alone are not sufficient to prevent spam. In an attempt to defeat the anti-spam development technologies, spammers have rec...
Abstract feature extraction for text classification
Göksel BİRİCİK, Banu DİRİ, Ahmet Coşkun SÖNMEZ. Department of Computer Engineering, Yıldız Technical University, Esenler, İstanbul, Turkey. E-mails: {goksel,banu,acsonmez}@ce.yildiz.edu.tr. Received: 03.02.2011. Abstract: Feature selection and extraction are frequently used solutions to overcome the curse of dimensionality in text classification problems. W...